import pandas as pd
import numpy as np
import plotly.express as px
from itables import show
import country_converter as cocoWorkshop Exercise: Emissions Data - Table, Map, and Chart
Introduction
In this exercise, you will work with a CO2 emissions dataset downloaded from Gapminder and produce a report with three tabs: a data table, a line chart, and a choropleth map.
The goal is to roughly replicate the Our World in Data visualization page on consumption-based CO2 emissions.
Be sure to view that page to get an idea of the final product.
Setup
You should have forked and cloned this repository to your local machine.
Now, create and select a virtual environment in VSCode.
Install the following packages:
- pandas
- plotly
- itables
- ipykernel
- jupyter
- country_converter
Download the data from Gapminder by selecting: Environment > Emissions > CO2 Total emissions, then downloading the CSV file into a
datafolder in your repository.
Data Import
Run the following code to import the necessary libraries:
Load in your dataset from gapminder below. View it in your data viewer to get an idea of the structure.
pd.set_option("mode.copy_on_write", True)# Load the data
emissions = pd.read_csv("data\co2_cons.csv")
emissions<>:2: SyntaxWarning:
invalid escape sequence '\c'
<>:2: SyntaxWarning:
invalid escape sequence '\c'
C:\Users\aijay\AppData\Local\Temp\ipykernel_27220\2093159210.py:2: SyntaxWarning:
invalid escape sequence '\c'
| country | 1800 | 1801 | 1802 | 1803 | 1804 | 1805 | 1806 | 1807 | 1808 | ... | 2013 | 2014 | 2015 | 2016 | 2017 | 2018 | 2019 | 2020 | 2021 | 2022 | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Afghanistan | 0.002 | 0.002 | 0.002 | 0.002 | 0.002 | 0.002 | 0.003 | 0.003 | 0.003 | ... | 8.84 | 8.290 | 8.850 | 8.480 | 8.820 | 9.320 | 9.850 | 10.200 | 11.2 | 11.7 |
| 1 | Angola | 0.015 | 0.015 | 0.015 | 0.015 | 0.015 | 0.015 | 0.015 | 0.015 | 0.015 | ... | 33.4 | 44.600 | 34.300 | 34.500 | 34.600 | 35.100 | 37.100 | 37.500 | 41.5 | 43.6 |
| 2 | Albania | 0.001 | 0.001 | 0.001 | 0.001 | 0.001 | 0.001 | 0.001 | 0.001 | 0.001 | ... | 6.54 | 6.480 | 5.880 | 5.800 | 6.120 | 5.980 | 5.890 | 5.740 | 6.06 | 5.98 |
| 3 | Andorra | 0.001 | 0.001 | 0.001 | 0.001 | 0.001 | 0.001 | 0.001 | 0.001 | 0.001 | ... | 0.421 | 0.418 | 0.428 | 0.441 | 0.463 | 0.459 | 0.462 | 0.438 | 0.472 | 0.472 |
| 4 | UAE | 0.003 | 0.003 | 0.003 | 0.003 | 0.003 | 0.003 | 0.003 | 0.003 | 0.003 | ... | 236 | 237.000 | 241.000 | 240.000 | 216.000 | 214.000 | 196.000 | 183.000 | 194 | 199 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 189 | Samoa | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | ... | 0.208 | 0.219 | 0.247 | 0.260 | 0.268 | 0.277 | 0.291 | 0.281 | 0.307 | 0.318 |
| 190 | Yemen | 0.006 | 0.006 | 0.006 | 0.006 | 0.006 | 0.006 | 0.006 | 0.006 | 0.006 | ... | 26.8 | 26.000 | 13.700 | 11.000 | 10.900 | 11.000 | 11.500 | 11.700 | 12.8 | 13.3 |
| 191 | South Africa | 0.005 | 0.005 | 0.005 | 0.005 | 0.005 | 0.005 | 0.005 | 0.005 | 0.005 | ... | 334 | 334.000 | 322.000 | 320.000 | 314.000 | 311.000 | 330.000 | 301.000 | 302 | 310 |
| 192 | Zambia | 0.191 | 0.194 | 0.198 | 0.202 | 0.206 | 0.210 | 0.214 | 0.218 | 0.222 | ... | 7.78 | 8.810 | 8.430 | 7.890 | 8.210 | 8.340 | 8.950 | 7.350 | 8.09 | 8.49 |
| 193 | Zimbabwe | 0.065 | 0.065 | 0.065 | 0.065 | 0.066 | 0.066 | 0.066 | 0.066 | 0.067 | ... | 12.3 | 12.900 | 13.500 | 11.900 | 10.400 | 12.500 | 12.000 | 11.600 | 12.6 | 13.1 |
194 rows × 224 columns
Initial Cleaning
In this dataset, some values are given in thousands, with a “k” used to represent the thousands. This will cause problems when we try to make these columns numeric. So we need to clean this. We’ll do this for you, but pay close attention as you might need it for your final project.
First, let’s see the issue:
emissions.query("country == 'China'")[["country", "2020", "2021", "2022"]]| country | 2020 | 2021 | 2022 | |
|---|---|---|---|---|
| 32 | China | 9980.0 | 10.6k | 10.5k |
Notice the letter “k” at the end of “10.6k” as an example.
We can remove the “k” and multiply those values by 1000 with the following code:
for col in ["2021", "2022"]:
has_k = emissions[col].str.contains("k")
values = emissions[col].str.replace("k", "")
emissions[col] = np.where(has_k, values.astype(float) * 1000, values.astype(float))And check that it worked:
emissions.query("country == 'China'")[["country", "2020", "2021", "2022"]]| country | 2020 | 2021 | 2022 | |
|---|---|---|---|---|
| 32 | China | 9980.0 | 10600.0 | 10500.0 |
Table Section
Our goal is to create a table showing emissions for a few selected years and calculate absolute and relative changes.
- Subset the data to include
Country,2000, and2022columns only. - Calculate an “Absolute Change” column as the difference between 2022 and 2000.
- Calculate a “Relative Change” column as the absolute change divided by the 2000 emissions, then multiplied by 100.
# Subset the data to include `country`, `2000`, and `2022` columns only.
table_df = emissions[['country', '2000', '2022']]
# Calculate absolute change as the difference between 2022 and 2000
table_df["Absolute Change"] = table_df['2022'] - table_df['2000']
# Calculate relative change as the absolute change divided by the 2000 emissions, then multiplied by 100
table_df["Relative Change"] = table_df["Absolute Change"]/table_df['2000']*100
# Round to 0 decimal places, and add a % sign to the relative change
table_df["Relative Change"] = table_df["Relative Change"].round(0).astype(str) + "%"Now we can display this as an interactive table with itables:
show(table_df)| Loading ITables v2.4.2 from the internet... (need help?) |
Chart Section
Our goal is to create a line chart from 1990 to 2022 for a few selected countries.
- Melt the original
emissionsdataset so that years become rows. - Filter from 1990 to 2022 only.
- Choose 5 countries of your choice.
- Create a line chart showing emissions over time for the selected countries with Plotly Express.
# Melt the original `emissions` dataset. Your id_vars should be "country", your var_name should be "year" and your value_name should be "emissions".
emissions_long = emissions.melt(id_vars='country', var_name='year', value_name="emissions")
# Convert year to numeric using pd.to_numeric
emissions_long["year"] = pd.to_numeric(emissions_long['year'], errors='coerce')
# Convert emissions to numeric using pd.to_numeric. Here, we also convert dashes to the minus sign
emissions_long["year"] = pd.to_numeric(emissions_long['year'], errors='coerce')
# Query for years between 1990 and 2022 (that is 1990, 1991, ..., 2021, 2022)
emissions_long_1990_2022 = emissions_long.query('year >= 1990 and year <= 2022')
# Query for 5 countries (adjust these to any countries you like)
emissions_long_subset = emissions_long_1990_2022.query("country in ['Nigeria', 'UAE', 'Canada', 'Japan', 'China']")
# Create line chart. Year should be on the x-axis, emissions on the y-axis, and color should be by country.
fig_chart = px.line(emissions_long_subset, x='year', y='emissions', color='country')
fig_chart.show()Mapping Section
This part is done for you.
Goal: Create a choropleth map showing global emissions from 1990 to 2022.
This will be animated by year.
- Ensure each country has a 3-letter ISO code. We’ll use
country_converterfor that. - Create a map with
px.choroplethand useanimation_frameto show changes over time.
emissions_long_1990_2022 = emissions_long_1990_2022.replace({"UAE": "United Arab Emirates"})# Convert country names to ISO3 codes
emissions_long_1990_2022["country_code"] = coco.convert(
emissions_long_1990_2022["country"], to="ISO3"
)
fig_map = px.choropleth(
emissions_long_1990_2022,
locations="country_code",
color="emissions",
hover_name="country",
animation_frame="year",
title="Global CO2 Emissions (1990-2022)",
)
fig_map.show()Final Tabset
Below, we place our results into a tabbed interface.
show(table_df)| Loading ITables v2.4.2 from the internet... (need help?) |
fig_chart.show()fig_map.show()Deploying to GitHub Pages
As a final step, you should follow the steps outlined in the prework to deplioy your report to GitHub Pages. You will be asked to share a link to your report in the course portal